A-rsibal_PeerCommentary_Amalik03_03
Some of my best friends are Zombies…
Reflection
I enjoyed most of this homework, but some of it was a struggle. I really enjoyed trying to figure out how to make my own functions despite how frustrating it was. I struggled with question 6 because I misread the question and was trying to plot at 100 X 30 data points into a vector of vectors but realized I only needed the means of each random sample. I also struggled with finding a way to add to vectors. I used the internet and found the function “append” to make changes to a vector. Finally I needed to refresh my memory on how to analyze qq plots because I didn’t know how to use the graph. I reread Module 8 to grab a better understanding of quantile-quantile plots and its uses.I didn’t use the full capabilities of ggplot2 because I wanted to make sure I did the homework correctly. I plan on adding color and prettiness for the final submission.
1. Calculate the population mean and standard deviation for each quantitative random variable (height, weight, age, number of zombies killed, and years of education).
Ritika’s comments: I also used the curl function! I really liked how you organised your code here, specifically since you used “mean.height…”, it made it very easy to read and follow! great job!
## Using libcurl 7.88.1 with LibreSSL/3.3.6
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ readr::parse_date() masks curl::parse_date()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
f <-curl("https://raw.githubusercontent.com/fuzzyatelin/fuzzyatelin.github.io/master/AN588_Fall23/zombies.csv")
d <-read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(d) #make sure it's set up right## id first_name last_name gender height weight zombies_killed
## 1 1 Sarah Little Female 62.88951 132.0872 2
## 2 2 Mark Duncan Male 67.80277 146.3753 5
## 3 3 Brandon Perez Male 72.12908 152.9370 1
## 4 4 Roger Coleman Male 66.78484 129.7418 5
## 5 5 Tammy Powell Female 64.71832 132.4265 4
## 6 6 Anthony Green Male 71.24326 152.5246 1
## years_of_education major age
## 1 1 medicine/nursing 17.64275
## 2 3 criminal justice administration 22.58951
## 3 1 education 21.91276
## 4 6 energy studies 18.19058
## 5 3 logistics 21.10399
## 6 4 energy studies 21.48355
## [1] 67.6301
## [1] 143.9075
## [1] 143.9075
## [1] 2.992
## [1] 2.996
pop.sd <- function(x){ #create a function for population SD
sqrt(
sum(
(x - mean(x))^2
)
/ length(x)
)
}
(height.sd <- pop.sd(d$height))## [1] 4.30797
## [1] 18.39186
## [1] 2.963583
## [1] 1.747551
## [1] 1.675704
2. Use {ggplot} to make boxplots of each of these variables by gender.
Ritika’s comments: Definitely not necessary here but as a suggestion you could also make the plotting into a function and pass in the x and y variables as input! But I think this section was very clear–great job!
heightplot <- ggplot(data = d, aes(x = gender, y = height)) + geom_boxplot() + xlab("Gender") + ylab("Height(inches)")
heightplotweightplot <- ggplot(data = d, aes(x = gender, y = weight)) + geom_boxplot() + xlab("Gender") + ylab("Weight(pounds)")
weightplotageplot <- ggplot(data = d, aes(x = gender, y = age)) + geom_boxplot() + xlab("Gender") + ylab("Age(years)")
ageplotkillsplot <- ggplot(data = d, aes(x = gender, y = zombies_killed)) + geom_boxplot() + xlab("Gender") + ylab("Zombies Killed")
killsploteduplot <- ggplot(data = d, aes(x = gender, y = years_of_education)) + geom_boxplot() + xlab("Gender") + ylab("Years of Education")
eduplot4. Using histograms and Q-Q plots, check whether the quantitative variables seem to be drawn from a normal distribution. Which seem to be and which do not (hint: not all are drawn from the normal distribution)? For those that are not normal, can you determine from which common distribution they are drawn?
Ritika’s comments: I liked your graph organisation here (both the hist and qq plots in same image) and that you used qqline in addition to qqnorm–i did the same thing! I actually said my graphs were left skewed but after reading your comments and doing a quick Google search I am now realizing that I am wrong! These are definitely right skewed graphs, so thank you!
par(mfrow = c(1,2))
hist(d$height) #Plots a basic histogram
qqnorm(d$height, frame = FALSE) #Plots a qqgraph of data points
qqline(d$height) #plots a qq line
Yes, based on the histogram and the qqplot, the height of the
survivors are drawn from a normal distribution curve.
Yes, based on the histogram and the qqplot, the weight of the
survivors are drawn from a normal distribution curve.
Yes, based on the histogram and the qqplot, the age of the survivors
are drawn from a normal distribution curve.
par(mfrow = c(1,2))
hist(d$zombies_killed)
qqnorm(d$zombies_killed, frame = FALSE)
qqline(d$zombies_killed)
No, based on the graphs, the data does not stem from a normal
distributions. This seems like the zombies killed by survivors data is
drawn from a right-skewed distribution.
par(mfrow = c(1,2))
hist(d$years_of_education)
qqnorm(d$years_of_education, frame = FALSE)
qqline(d$years_of_education)
No, the data for survivors’ years of education is not drawn from a
normal disrtribution. The data is looks heavily right skewed.
5. Now use the sample() function to sample ONE subset of 30 zombie survivors (without replacement) from this population and calculate the mean and sample standard deviation for each variable. Also estimate the standard error for each variable, and construct the 95% confidence interval for each mean. Note that for the variables that are not drawn from the normal distribution, you may need to base your estimate of the CIs on slightly different code than for the normal…
Ritika’s comments: cool use of sample_n–I did not know that existed and ended up downloading a separate package! One thing I think you might have missed here is that the first 3 graphs are normally distributed so we probably want to use qnorm instead of qt to calculate the CIs (i think this was in Mod 9), and then keep qt for the not normally distributed ones (last 2)
## id first_name last_name gender height weight zombies_killed
## 1 36 Walter Carpenter Male 70.90143 142.7320 3
## 2 965 Samuel Hamilton Male 74.16976 163.0445 3
## 3 839 Ryan Barnes Male 68.23992 128.3578 2
## 4 343 Beverly Phillips Female 67.03964 138.8831 1
## 5 126 Eric Young Male 70.44371 171.2002 4
## 6 819 Harold Patterson Male 70.75748 152.3995 3
## 7 776 Shirley Jones Female 63.16838 133.4696 3
## 8 231 Kathleen Wright Female 63.35430 118.5460 2
## 9 937 Raymond Roberts Male 65.37115 147.0078 2
## 10 510 Tina Taylor Female 64.25954 124.1066 1
## 11 540 Norma Anderson Female 62.43483 112.0983 3
## 12 13 Alan Olson Male 66.69701 133.6441 2
## 13 732 Steven Reynolds Male 68.95874 157.7417 1
## 14 436 Jack Marshall Male 67.57963 137.0341 4
## 15 162 Roy Dunn Male 75.27081 167.3627 2
## 16 872 Carol Freeman Female 68.84493 138.4296 3
## 17 933 Amy Willis Female 67.89424 124.2984 5
## 18 224 Tammy Gomez Female 61.35044 120.3426 2
## 19 235 Betty Howard Female 63.54701 145.4172 4
## 20 796 Betty Tucker Female 63.82403 119.6649 2
## 21 472 Annie Thompson Female 66.12076 134.5862 0
## 22 317 Jacqueline Hicks Female 63.00016 131.8478 3
## 23 65 Gary Morris Male 72.98792 154.5452 4
## 24 432 Daniel Campbell Male 73.13875 183.5933 1
## 25 261 Tammy Hart Female 57.63606 115.9745 4
## 26 756 Amanda Rogers Female 60.74299 124.3453 2
## 27 455 Keith Burns Male 71.72562 157.6048 3
## 28 809 Roy Mason Male 69.61105 144.9560 2
## 29 92 Alan Ramos Male 65.04941 149.8804 4
## 30 368 Jane Woods Female 69.90024 155.6157 1
## years_of_education major age
## 1 2 physical education 23.98011
## 2 0 agricultural sciences 23.24972
## 3 1 physical education 19.92937
## 4 2 medicine/nursing 20.93191
## 5 5 mechanical engineering 20.42065
## 6 0 physical education 21.43846
## 7 5 botany 17.85399
## 8 5 business administration 21.93743
## 9 5 culinart services 18.55335
## 10 2 applied sciences 22.34368
## 11 2 education 18.06575
## 12 1 economics 18.59929
## 13 3 medicine/nursing 21.93762
## 14 2 environmental science 19.74716
## 15 1 biology 23.28328
## 16 7 criminal justice administration 24.00616
## 17 4 physical education 22.84620
## 18 3 agricultural sciences 18.31888
## 19 4 economics 16.53939
## 20 2 logistics 21.72521
## 21 2 culinart services 20.51588
## 22 5 applied sciences 16.87172
## 23 0 education 19.81533
## 24 5 agricultural sciences 22.57573
## 25 1 education 12.29306
## 26 2 city planning 16.33405
## 27 1 biology 19.30345
## 28 3 environmental science 21.90370
## 29 1 culinart services 15.18804
## 30 2 architecture 22.19631
## [1] 67.134
## [1] 140.9577
## [1] 20.09016
## [1] 2.533333
## [1] 2.6
## [1] 4.250069
## [1] 17.56421
## [1] 2.763434
## [1] 1.175679
## [1] 1.8
## [1] 0.7759529
## [1] 3.206772
## [1] 0.5045317
## [1] 0.2146487
## [1] 0.3286335
xbar = NULL
sd= NULL
bounds <- function(xbar , sd){
t.score <- qt(p=0.05/2, df = 29,, lower.tail = FALSE)
lower.bound <- xbar - (t.score * sd/sqrt(30))
upper.bound <- xbar + (t.score * sd/sqrt(30))
print(c(lower.bound, upper.bound))
}
bounds(mean.height.s, sd.height.s)## [1] 65.547 68.721
## [1] 134.3991 147.5163
## [1] 19.05828 21.12205
## [1] 2.094327 2.972339
## [1] 1.927869 3.272131
Now draw 99 more random samples of 30 zombie apocalypse survivors, and calculate the mean for each variable for each of these samples. Together with the first sample you drew, you now have a set of 100 means for each variable (each based on 30 observations), which constitutes a sampling distribution for each variable. What are the means and standard deviations of this distribution of means for each variable? How do the standard deviations of means compare to the standard errors estimated in [5]? What do these sampling distributions look like (a graph might help here)? Are they normally distributed? What about for those variables that you concluded were not originally drawn from a normal distribution?
Ritika’s comments: great job here! I think we ended up doing the sampling in almost the exact same way. I’ve been coding for a long time now and I have to say your code is very well thought out and looks like someone who is an experienced coder. I like that you reuse your functions here and that you also added the QQ plots–I only plotted the histograms so I will be definitely adding that. The only suggestion here is that I dont think you need to have par(mfrow = c(1,2)) every time. I think once you set it in a session it should hold its value for the rest of the file (until you change it)
meanheights <- c(mean.height.s)
meanweights <- c(mean.weight.s)
meanages <- c(mean.age.s)
meankills <- c(mean.kills.s)
meanedus <- c(mean.edu.s)
for (x in 1:99) { #create a loop that adds the mean vectors of the quantitative date to a vector of means
rsample <- sample_n(d, size = 30, replace = FALSE)
meanheights <- append(meanheights, mean(rsample$height))
meanweights <- append(meanweights, mean(rsample$weight))
meanages <- append(meanages, mean(rsample$age))
meankills <- append(meankills, mean(rsample$zombies_killed))
meanedus <- append(meanedus, mean(rsample$years_of_education))
}
meanheights## [1] 67.13400 68.27821 67.67389 68.23831 68.05631 68.13375 69.28150 66.39103
## [9] 67.75279 67.68368 66.65270 68.63706 66.04277 67.46130 66.78045 67.92521
## [17] 66.70691 68.54358 68.53892 67.47763 67.95184 69.35587 67.78857 66.22889
## [25] 67.61154 67.43544 67.42596 68.24939 66.44887 66.63638 67.90845 67.94210
## [33] 66.77796 67.98100 67.31928 66.20743 69.79256 68.22401 67.91228 68.64239
## [41] 68.95279 67.22306 69.24500 67.69711 68.07894 66.10750 67.36105 67.41173
## [49] 66.36788 67.68149 67.52382 68.72029 68.28310 68.87110 67.43820 67.51291
## [57] 66.87908 67.02280 67.84456 66.40778 68.59410 68.02518 68.28912 67.82374
## [65] 65.85376 66.84396 67.19964 67.50179 69.17147 68.49748 67.78277 67.73995
## [73] 67.36755 67.64773 69.12397 66.35496 67.77447 67.83693 67.88098 66.81460
## [81] 66.95212 67.62374 68.48978 67.73330 68.03024 67.65567 67.73450 65.93704
## [89] 68.84943 67.87285 68.07648 66.97512 68.22720 66.75427 67.52277 68.52634
## [97] 67.38308 67.35737 67.84445 66.98143
## [1] 0.8216888
## [1] 140.9577 148.8191 144.6930 146.9882 144.7800 144.0543 151.3254 138.0279
## [9] 145.9604 146.4166 140.3435 148.4224 139.5249 142.5249 140.6115 140.7086
## [17] 138.4230 145.1214 146.0666 143.1946 145.0178 150.7471 140.6832 138.2050
## [25] 144.4396 138.2144 143.6027 147.4786 138.1750 140.8663 146.6401 146.0239
## [33] 144.4406 145.7809 143.7467 138.2007 154.1304 145.3050 144.1590 149.0686
## [41] 148.2953 144.0047 149.9251 144.1349 148.1647 137.1428 146.1101 143.4973
## [49] 141.3555 145.0650 140.8638 149.2913 147.1552 146.7595 143.3074 147.4660
## [57] 141.0459 142.0823 146.5660 138.7607 149.5523 142.3424 146.1611 143.4733
## [65] 139.3400 143.3765 143.3385 147.6238 149.4568 147.3603 148.4950 144.6271
## [73] 142.0884 143.5700 149.0014 138.4415 147.7062 144.6965 143.4083 140.3828
## [81] 138.7293 147.7390 148.8483 145.2991 146.9725 144.9653 143.1601 139.9778
## [89] 147.0154 146.5757 143.7436 141.3445 145.9101 138.2358 143.1136 147.2740
## [97] 145.0123 140.2812 144.2747 141.2813
## [1] 3.554466
## [1] 20.09016 20.35339 19.92714 19.94374 20.83312 20.36676 20.66315 20.04407
## [9] 19.93331 19.36285 19.31068 20.22151 19.24674 20.28719 19.29950 21.24731
## [17] 19.94526 20.27997 20.66201 19.71245 20.00827 20.87650 20.29223 19.60265
## [25] 20.78528 20.14804 20.26358 20.30540 19.47730 19.86868 20.28794 20.07563
## [33] 19.45442 20.21944 20.40191 19.60320 20.91108 20.44651 20.61264 20.26201
## [41] 21.11380 19.64440 20.73244 20.68375 19.77418 19.50241 19.18708 20.26785
## [49] 18.89746 19.78174 19.83161 20.27003 20.56288 20.58653 20.13285 19.82258
## [57] 19.32622 19.19253 19.92185 19.61020 19.52189 20.91088 20.97505 20.93122
## [65] 19.08859 18.92208 20.20053 19.19859 21.38954 20.41330 19.55905 20.36850
## [73] 20.18380 20.10413 20.26552 19.66762 20.46000 19.82398 20.98094 19.98325
## [81] 20.00672 19.83595 20.06204 20.14549 19.72102 20.36216 20.36792 18.69207
## [89] 21.10751 20.35233 21.23646 19.16595 21.05086 20.13401 20.12513 21.20104
## [97] 20.15978 20.70634 19.80555 19.56325
## [1] 0.5821113
## [1] 2.533333 2.866667 2.533333 2.700000 2.900000 2.966667 3.366667 3.200000
## [9] 2.800000 2.666667 2.900000 2.766667 2.933333 3.033333 3.266667 3.266667
## [17] 2.966667 2.966667 2.666667 3.200000 3.233333 2.966667 2.900000 3.000000
## [25] 2.866667 3.300000 3.266667 2.766667 3.133333 3.000000 3.133333 3.233333
## [33] 3.000000 2.566667 3.033333 3.233333 2.966667 3.033333 3.000000 2.900000
## [41] 2.900000 3.433333 2.600000 3.133333 3.233333 3.100000 3.366667 3.566667
## [49] 2.900000 2.766667 2.566667 2.666667 2.933333 3.233333 3.600000 2.700000
## [57] 3.066667 3.300000 2.900000 3.366667 2.833333 2.966667 2.833333 2.966667
## [65] 2.633333 3.100000 2.500000 2.833333 2.833333 2.500000 2.500000 2.933333
## [73] 3.400000 2.733333 2.966667 2.633333 2.900000 3.266667 3.666667 2.933333
## [81] 3.133333 3.433333 3.100000 3.100000 2.733333 2.733333 2.733333 3.066667
## [89] 3.066667 3.000000 2.833333 2.733333 3.100000 2.633333 2.900000 2.666667
## [97] 2.866667 2.900000 2.933333 3.033333
## [1] 0.2561467
## [1] 2.600000 2.766667 2.766667 3.133333 3.133333 3.133333 2.533333 3.033333
## [9] 2.900000 2.833333 3.066667 3.133333 3.233333 3.800000 3.233333 2.600000
## [17] 3.133333 3.066667 3.033333 3.466667 2.766667 3.333333 3.033333 2.733333
## [25] 2.733333 3.066667 3.500000 3.100000 3.266667 2.800000 2.800000 3.000000
## [33] 3.033333 2.733333 3.233333 2.966667 3.200000 3.433333 3.366667 2.766667
## [41] 3.100000 2.666667 3.100000 2.566667 2.200000 3.166667 3.166667 2.766667
## [49] 3.066667 2.633333 3.266667 2.933333 3.033333 3.200000 3.233333 3.133333
## [57] 2.633333 3.000000 2.433333 3.100000 3.266667 3.266667 3.400000 3.200000
## [65] 2.833333 2.966667 2.966667 2.900000 3.133333 3.533333 2.633333 2.866667
## [73] 2.333333 3.033333 3.266667 2.800000 2.600000 2.733333 3.100000 3.033333
## [81] 3.733333 2.833333 2.933333 2.500000 2.900000 2.766667 2.333333 3.366667
## [89] 2.433333 3.233333 3.033333 2.766667 3.166667 2.966667 3.400000 3.466667
## [97] 2.366667 3.133333 2.933333 3.166667
The standard deviations are smaller from this population are
smaller than that of the one from Question 5.
par(mfrow = c(1,2))
hist(meanheights) #makes a basic histogram
qqnorm(meanheights, frame = FALSE) #plots a qq graph
qqline(meanheights) #plots qq line on the graph for analysis
Based on th histograms and qqplots, they all seem more likely to be
drawn from a normal distribution. However, mean weight is likely not
drawn from normal distributed data.